Jogging the Memory of Unlearned Model Through Targeted Relearning Attack
Hu, Shengyuan; Fu, Yiwei; Wu, Zhiwei Steven; Smith, Virginia
Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to reverse the effects of unlearning. We formalize this unlearning-relearning pipeline, explore the attack across three popular unlearning benchmarks, and discuss future directions and guidelines that result from our study.
arXiv:2406.13356
Country:
- North America > United States > Indiana > Saint Joseph County > Granger (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Virginia (0.04)
Genre:
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.35)